Transporting Version 8 of Icon*
Ralph E. Griswold
TR 90-5c
January 1, 1990; last modified March 29, 1990
Department of Computer Science
The University of Arizona
Tucson, Arizona 85721
*This work was supported by the National Science Foundation under
Grant CCR-8901573.
1.__Background
The implementation of the Icon programming language is large
and complex [1]. It is, however, written almost entirely in C,
and it is designed to be portable to a wide range of computers
and operating systems.
The implementation was developed on a UNIX* system. It has
been installed on a wide range of UNIX systems, from mainframes
to personal computers. Putting Icon on a new UNIX system is more
a matter of installation than porting [2]. There presently also
are implementations of Icon for the Amiga, the Atari ST, the
Macintosh, MS-DOS, MVS, OS/2, VM/CMS, and VMS. This document
addresses the problems and procedures for porting Icon to other
operating systems and computers.
The current version of Icon is 8 [3]. All installations of
Version 8 of Icon are obtained from common source code, using
conditional compilation to select system-dependent code. Conse-
quently, transporting Icon to a new system is largely a matter of
selecting appropriate values for configuration parameters, decid-
ing among alternative definitions, and possibly adding some code
that is computer- or operating-system-dependent.
A small amount of assembly-language code is needed for a com-
plete installation. See Section 7. This code is optional and
only affects co-expressions. A running version of the language
can be obtained by working only in C.
Transporting Icon to a new system is a fairly complex task,
although there are many aids to simplify the mechanical portions.
Read this report carefully before beginning a port. Understand-
ing the Icon programming language is helpful during the debugging
phase of a port. See [3-5].
2.__Requirements
C_Data_Sizes
Icon places the following requirements on C data sizes:
__________________________
*UNIX is a trademark of AT&T Bell Laboratories.
+ chars must be 8 bits.
+ ints must be 16, 32, or 64 bits.
+ longs and pointers must be 32 or 64 bits.
+ All pointers must be the same length.
+ longs and pointers must be the same length.
If your C data sizes do not meet these requirements, do not
attempt to transport Icon. Call the Icon Project for advice.
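The requirements above can be checked mechanically before any porting work begins. The following is a minimal sketch, not part of the Icon distribution; the names icon_sizes_ok and host_sizes_ok are hypothetical, and the pointer comparison covers only a few data pointer types.

```c
#include <limits.h>

/* Return 1 if the given widths (in bits) satisfy the requirements
   listed above, 0 otherwise. */
int icon_sizes_ok(int char_bits, int int_bits, int long_bits, int ptr_bits)
{
    if (char_bits != 8)
        return 0;                       /* chars must be 8 bits */
    if (int_bits != 16 && int_bits != 32 && int_bits != 64)
        return 0;                       /* ints must be 16, 32, or 64 bits */
    if (long_bits != 32 && long_bits != 64)
        return 0;                       /* longs must be 32 or 64 bits */
    if (ptr_bits != long_bits)
        return 0;                       /* longs and pointers must match */
    return 1;
}

/* Apply the check to the compiler at hand. */
int host_sizes_ok(void)
{
    /* All pointers must be the same length (a few data pointers sampled). */
    if (sizeof(char *) != sizeof(int *) || sizeof(char *) != sizeof(void *))
        return 0;
    return icon_sizes_ok(CHAR_BIT,
                         (int)(sizeof(int) * CHAR_BIT),
                         (int)(sizeof(long) * CHAR_BIT),
                         (int)(sizeof(char *) * CHAR_BIT));
}
```

A one-line main() that prints the result of host_sizes_ok() is enough to decide whether to proceed. A result of 0 corresponds to the case in which the Icon Project should be consulted.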
The_C_Compiler
The main requirement for implementing Icon is a production-
quality C compiler that supports at least the de facto ``K&R''
standard [6]. The term ``production quality'' implies robust-
ness, correctness, the ability to handle large files and compli-
cated expressions, and a comprehensive run-time library.
The C preprocessor should conform either to the ANSI C standard
[7] or to the de facto standard for UNIX C preprocessors. In
particular, Icon uses the C preprocessor to concatenate strings
and substitute arguments within quotation marks. For the ANSI
preprocessor standard, the following definitions are used:
#define Cat(x,y) x##y
#define Lit(x) #x
For the UNIX de facto standard, the following definitions are
used:
#define Ident(x) x
#define Cat(x,y) Ident(x)y
#define Lit(x) "x"
The following program can be used to test these preprocessor
facilities:
Cat(ma,in)()
{
printf(Lit(Hello world\n));
}
If this program does not compile and print Hello world using one
of the sets of definitions above, there is no point in proceed-
ing. Contact the Icon Project as described in Section 8 for
alternative approaches.
Memory
The Icon programming language requires a substantial amount of
memory to run. The practical minimum is 640Kb.
File_Space
The source code for Icon is large - about 1 Mb. Compilation
and testing require considerably more space. While the implemen-
tation can be divided into components that can be transported
separately, this approach may be painful.
3.__Organization_of_the_Implementation
Icon was developed on a hierarchical file system. To facili-
tate file transfer between different operating systems and to
simplify porting to systems that do not support file hierarchies,
the source code for Icon is provided both in hierarchical form
and in a ``flat'' form in which all files reside in the same
area. This document applies to both the hierarchical and flat
forms. Some of the descriptions that follow refer to file hierar-
chies. In interpreting this documentation for a flat system, sim-
ply ignore the directories in path specifications; the file names
themselves are the same in the hierarchical and flat version.
3.1__Source_Code
There are two components of Icon:
icont   a command processor that converts source-language programs
        into icode, the ``executable binary'' for the Icon virtual
        machine.
iconx   an executor for icode, including a run-time system that
        supports the operations of the Icon language.
The files related to the source are packaged in four sections:
h        headers
icont    files for icont
iconx    files for iconx
common   common files (see footnote 1)
In some forms of the diskette distribution, iconx comes in two
parts, since it is too large to fit on some kinds of
diskettes.
Appendix A lists the files of each component of Icon. Some
header files are used in both components; these are identified in
the appendix. The files icont.bat and iconx.bat are scripts that
indicate what files are to be compiled and loaded to produce the
respective components. These scripts were derived from a UNIX
implementation, but they can be adapted easily to other systems.
__________________________
1 Some files are shared by icont and iconx. Others are in
this package for organizational reasons because they are
shared by other programs related to Icon.
4.__An_Overview_of_the_Porting_Process
The first step in the porting process is to configure the
source code for the new system. This process is described in Sec-
tion 5.1. After this is done, icont and iconx need to be con-
structed.
The process for each component is essentially the same:
+ provide code and definitions that are system-dependent
+ compile the source files and link them to produce execut-
able binary files
+ test the result
+ debug, iterating over the previous steps as necessary
icont needs to be ported before iconx, since the output of
icont is needed to test iconx. Of course, bugs in icont may not
show up until iconx is tested.
In addition to this obvious sequence of steps, some aspects of
the implementation may be deferred until the entire system is
running, or they may be implemented in a preliminary manner and
subsequently refined. For example, the assembly-language portion
of iconx is best left unimplemented until the rest of the system
is running.
Considerable frustration can be avoided if problems that come
up can be circumvented with temporary expedients until the major-
ity of the implementation is working properly. Similarly, conser-
vative choices should be made during the initial phases of the
implementation.
5.__Conditional_Compilation
Conditional compilation is used extensively in Icon to select
code that is appropriate to a particular installation. Conceptu-
ally, conditional compilation can be divided into two categories:
(1) Matters related to the details of computer architec-
ture, run-time system idiosyncrasies, specific C com-
pilers, and operating-system variants.
(2) Matters that are specific to operating systems that are
distinctly different, such as MS-DOS, UNIX, and VMS.
5.1__Parameters_and_Definitions
There are many defined constants and macros in the source code
for Icon that vary from system to system. The file h/config.h,
which is included at the beginning of every .c file, manages the
configuration (see footnote 1). It includes h/define.h and, based on the
information there, provides appropriate definitions, including defaults
for information that is not specified in define.h. It is in
define.h that changes and additions for a specific implementation
need to be made. This file initially contains definitions for a
``vanilla'' 32-bit system. If your system closely approximates
such a system, you will have few changes to make to define.h.
Over the range of possible systems, there are many possibilities
as described below. Do not be intimidated by the large number of
options that follow; only a few are needed for any one implemen-
tation.
The definitions are grouped into categories so that any neces-
sary changes to define.h can be approached in a logical way.
Debugging code: Icon contains some code to assist in debugging.
It is enabled by the definitions
#define DeBugTrans /* debugging code for the translator in icont */
#define DeBugLinker /* debugging code for the linker in icont */
#define DeBugIconx /* debugging code for the executor */
All three of these are automatically defined if DeBug is defined.
DeBug is defined in define.h as it is distributed, so all debug-
ging code is enabled.
The debugging code for the translator consists of functions
for dumping symbol tables (see icont/tsym.c). These functions are
rarely needed and there are no calls to them in the source code
as it is distributed.
The debugging code for the linker consists of a function for
dumping the code region (see icont/lcode.c) and code for generat-
ing a debugging file that is a printable image of the icode file
produced by the linker. This debugging file, which is produced if
the option -L is given on the command line when icont is run,
frequently is useful if problems are encountered in the linker.
See Section 6.
The debugging code for the executor consists of a few validity
checks at places where problems have been encountered in the
past. It also provides functions for dumping Icon values. See
iconx/rmisc.c and iconx/rmemmgt.c.
It usually is advisable to leave the debugging code enabled until
Icon is known to be running properly. The code is innocuous and
adds only a few percent to the size of the executable files. It
should be removed by deleting the definition of DeBug from
define.h as the final step in the implementation.
__________________________
1 config.h includes <stdio.h>, so you should not include it
elsewhere.
C preprocessor considerations: If your C preprocessor supports
the ANSI draft standard, add
#define StandardPP
to define.h.
C compiler considerations: If your C compiler supports the ANSI C
draft standard, add
#define StandardC
to define.h.
This has several effects. One is to provide a typedef for
pointer that is void * rather than char *. It also enables func-
tion prototypes and the use of the void type for functions that
do not return values.
C library considerations: If your C compiler has an ANSI C draft
standard C library, add
#define StandardLib
to define.h.
Alternatively, if your system has a standard C preprocessor,
compiler, and library, just add
#define Standard
which defines StandardPP, StandardC, and StandardLib.
If your C compiler supports the void type but not the ANSI C
draft standard, add
#define VoidType
to define.h.
If your C compiler supports function prototypes but not the
ANSI C draft standard, add
#define Prototypes
to define.h. This causes function prototypes (in proto.h) to be
used in place of forward declarations. The use of prototypes may
be very helpful in getting Icon to work, especially on systems
with 16-bit ints or unusual pointer representations. (Function
prototypes are produced using a macro, Params(s). See the defini-
tion of Params(s) in h/config.h and examples of its use in
h/proto.h.)
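The idea behind a macro such as Params(s) can be sketched as follows. This is a common idiom and an assumption about the general shape only; the actual definition is the one in h/config.h and may differ. The function scale_word is hypothetical.

```c
/* Assumption for this sketch: the compiler accepts function prototypes. */
#define Prototypes

#ifdef Prototypes
#define Params(s) s        /* keep the parameter list */
#else
#define Params(s) ()       /* drop it for compilers without prototypes */
#endif

/* The doubled parentheses pass the whole parameter list as ONE macro
   argument, commas included. */
long scale_word Params((long value, int factor));

long scale_word(long value, int factor)
{
    return value * factor;
}
```

With Prototypes defined, the declaration expands to a full prototype, so a call with mismatched argument types is diagnosed at compile time. Without it, the same line expands to an old-style forward declaration.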
On some systems it may be necessary to provide a different
typedef for pointer than mentioned above. For example, on the
huge-memory-model implementation of Icon for Microsoft C on MS-
DOS, its define.h contains
typedef huge void *pointer;
If an alternative typedef is used for pointer, add
#define PointerDef
to define.h to avoid the default one.
Sometimes computing the difference of two pointers causes
problems. Pointer differences are computed using the macro
DiffPtrs(p1,p2), which has the default definition:
#define DiffPtrs(p1,p2) (word)((p1)-(p2))
where word is a typedef that is provided automatically and usu-
ally is long int.
This definition can be overridden in define.h. For example,
Microsoft C for the MS-DOS large memory model uses
#define DiffPtrs(p1,p2) ((word)(p1)-(word)(p2))
If you provide an alternative definition for pointer differencing,
be careful to enclose all arguments in parentheses.
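As an illustration of the default definition in use, the sketch below measures the distance between two positions in one buffer. The typedef for word is an assumption based on the statement that it usually is long int, and bytes_used is a hypothetical helper, not an Icon routine.

```c
typedef long int word;                      /* assumed: "usually is long int" */

#define DiffPtrs(p1,p2) (word)((p1)-(p2))   /* default definition from the text */

/* Hypothetical helper: number of bytes between the base of a region
   and its current free pointer. Both must point into the same array. */
word bytes_used(char *base, char *free_ptr)
{
    return DiffPtrs(free_ptr, base);
}
```

Because each argument is parenthesized inside the macro, a call such as DiffPtrs(base + 40, base) expands to (word)((base + 40)-(base)) and groups as intended.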
C sizing and alignment: There are four constants that relate to
the size of C data and alignment:
IntBits (default: 32)
WordBits (default: 32)
Double (default: undefined)
IntBits is the number of bits in a C int. It may be 16, 32, or
64. WordBits is the number of bits in a C long (Icon's ``word'').
It may be 32 or 64. If your C library expects doubles to be
aligned at double-word boundaries, add
#define Double
to define.h.
The word alignment of stacks used by co-expressions is controlled
by
StackAlign (default: 2)
If your system needs a different alignment, provide an appropri-
ate definition in define.h.
Most computers have downward-growing C stacks, for which stack
addresses decrease as values are pushed. If you have an upward-
growing stack, for which stack addresses increase as values are
pushed, add
#define UpStack
to define.h.
Floating-point arithmetic: There are three optional definitions
related to floating-point arithmetic:
Big (default: 9007199254740092.)
LogHuge (default: 309)
Precision (default: 10)
The values of Big, LogHuge, and Precision give, respectively, the
largest floating-point number that does not lose precision, the
maximum base-10 exponent + 1 of a floating-point number, and the
number of digits provided in the string representation of a
floating-point number. If the default values given above do not
suit the floating-point arithmetic on your system, add appropri-
ate definitions to define.h.
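If the C library provides <float.h>, candidate values for Big and LogHuge can be derived from it and compared with the defaults above. The following is a sketch under that assumption; candidate_big and candidate_log_huge are hypothetical names, and the results are only starting points for a system whose floating-point format is unusual.

```c
#include <float.h>

/* Radix raised to the number of mantissa digits: integers up to this
   magnitude are represented exactly in a double. */
double candidate_big(void)
{
    double b = 1.0;
    int i;

    for (i = 0; i < DBL_MANT_DIG; i++)
        b *= FLT_RADIX;
    return b;
}

/* Maximum base-10 exponent of a double, plus one. */
int candidate_log_huge(void)
{
    return DBL_MAX_10_EXP + 1;
}
```

On a system with IEEE 754 double precision, DBL_MAX_10_EXP is 308, so candidate_log_huge() agrees with the default of 309.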
Open options: The options for opening files with fopen() are
given by the following constants:
ReadBinary (default: "rb")
ReadText (default: "r")
WriteBinary (default: "wb")
WriteText (default: "w")
These defaults can be changed by definitions in define.h.
Run-time routines: The support for some run-time routines varies
from system to system. The related constants are:
IconGcvt (default: undefined)
IconQsort (default: undefined)
SysMem (default: undefined)
index (default: undefined)
rindex (default: undefined)
If IconGcvt and IconQsort are defined, versions of gcvt() and
qsort() in the Icon system are used in place of the routines nor-
mally provided in the C run-time system. These constants only
need to be defined if the versions of these routines in your
run-time system are defective or missing.
If SysMem is defined and IntBits == WordBits, the C run-time
routines memcpy() and memset() are used in place of the
corresponding Icon routines memcopy() and memfill(). SysMem is
automatically defined if StandardLib is.
Different C compilers use different names for the routines for
locating substrings within strings. The source code for Icon uses
index and rindex. The other possibilities are strchr and strrchr.
If your system uses the latter names, add
#define index strchr
#define rindex strrchr
to define.h.
Similarly, Icon uses unlink for the routine that deletes a
file. The other common name is remove. If your system uses this
name, for example, add
#define unlink remove
to define.h.
Storage management: Icon includes its own versions of malloc(),
calloc(), realloc(), and free() so that it can manage its storage
region without interference from allocation by the operating sys-
tem. Normally, Icon's versions of these routines are loaded
instead of the system library routines.
Leave things as they are in the initial configuration, but if
your system insists on loading its own library routines, multiple
definitions will occur as a result of the ld in src/iconx. If
multiple definitions occur, go back and add
#define IconAlloc
to define.h. This definition causes Icon's routines to be named
differently to avoid collision with the system routine names.
One possible effect of this definition is to interfere with
Icon's expansion of its memory region in case the initial values
for allocated storage are not large enough to accommodate a pro-
gram that produces a lot of data. This problem appears in the
form of run-time errors 305-307. Users can get around this prob-
lem on a case-by-case basis by increasing the initial values for
allocated storage by setting environment variables [8].
Icon's dynamic storage allocation system uses three memory
regions. In some implementations, these regions expand if neces-
sary, allowing memory space to be used in a flexible fashion.
This ``expandable regions'' method relies on the use of brk() and
sbrk() and the system treatment of user memory space as one logi-
cally contiguous region. This method does not work on many sys-
tems that treat memory as segmented or do not support brk() and
sbrk(). On such systems, fixed-sized regions are used. Since
this is the commonest case,
#define FixedRegions
is included in define.h initially. If your system supports brk()
and sbrk(), you may wish to remove this definition in order to
get better utilization of memory. However, since expandable
regions are more prone to problems than fixed regions, it is wise
to start with the latter and try the former only after everything
else is working.
Storage regions: The sizes of Icon's run-time storage regions for
allocated data normally are the same for all implementations.
However, different values can be set:
MaxStatSize (default: 20480 if co-expressions are enabled, else 1024)
MaxAbrSize (default: 65000)
MaxStrSize (default: 65000)
Since users can override the set values with environment vari-
ables, it is unwise to change them from their defaults except in
unusual cases.
The sizes for Icon's main interpreter stack and co-expression
stacks also can be set:
MStackSize (default: 10000)
StackSize (default: 2000)
As for the block and string storage regions, it is unwise to
change the default values except in unusual cases.
Finally, with fixed-regions storage management, a list used
for pointers to strings during garbage collection can be sized:
QualLstSize (default: 5000)
Like the sizes above, this one normally is best left unchanged.
Allocation size: Normally malloc() is used to allocate space for
Icon's storage regions. This limits region sizes to the value of
the largest unsigned int. Some systems provide alternative allo-
cation routines for allocating larger regions. To change the
allocation procedure for regions, add a definition for AllocReg
to define.h. For example, the huge-memory-model implementation of
Icon for Microsoft C uses the following:
#define AllocReg(n) halloc((long)n,sizeof(char))
Note: Icon still uses malloc() for allocating other blocks. If
this is a problem, it may be possible to change this by defining
malloc in define.h, as in
#define malloc lmalloc
If this is done, and the size of the allocation is not unsigned
int, add an appropriate definition for the type by defining
AllocType in define.h, such as
#define AllocType unsigned long int
It is also necessary to add a definition for the limit on the
size of an Icon region:
#define MaxBlock n
where n is the maximum size allowed (the default for MaxBlock is
MaxUnsigned, the largest unsigned int). It generally is not
advisable to set MaxBlock to the largest size an alternative
allocation routine can return. For the huge-memory-model imple-
mentation mentioned above, MaxBlock is 256000.
File name suffixes: The suffixes used to identify Icon source
programs, ucode files, and icode files may be specified in
define.h:
#define SourceSuffix  (default: ".icn")
#define U1Suffix      (default: ".u1")
#define U2Suffix      (default: ".u2")
#define USuffix       (default: ".u")
#define IcodeSuffix   (default: "")
#define IcodeASuffix  (default: "")
USuffix is used for the abbreviation that icont understands in
place of the complete U1Suffix or U2Suffix. IcodeASuffix is an
alternative suffix that iconx uses when searching for icode files
specified without a suffix. For example, on MS-DOS, IcodeSuffix
is ".icx" and IcodeASuffix is ".ICX".
If values other than the defaults are specified, care must be
taken not to introduce conflicts or collisions among names of
different types of files.
Paths: If icont is given a source program in a directory dif-
ferent from the local one (``current working directory''), there
is a question as to where ucode and icode files should be
created: in the local directory or in the directory that contains
the source program. On most systems, the appropriate place is in
the local directory (the user may not have write permission in
the directory that contains the source program). However, on
some systems, the directory that contains the source file is
appropriate. By default, the directory for creating new files is
the local directory. The other choice can be selected by adding
#define TargetDir SourceDir
Command-line options: The command-line options that are supported
by icont are defined by Options. The default value (see config.h)
will do for most systems, but an alternative can be included in
define.h.
Similarly, the error message produced by icont for erroneous
command lines is defined by Usage. The default value, which
should correspond to the value of Options, is in config.h, but
may be overridden by a definition in define.h.
Environment variables: If your system does not support environ-
ment variables (via the run-time library routine getenv), add the
following line to define.h:
#define NoEnvVars
This disables Icon's ability to change internal parameters to
accommodate special user needs (such as using memory region sizes
different from the defaults), but does not otherwise interfere
with the use of Icon.
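The effect of NoEnvVars can be pictured with a small sketch. It is not the code used in Icon; env_size is a hypothetical helper that returns a compiled-in default unless a usable value is found in the environment, and it ignores the environment entirely when NoEnvVars is defined.

```c
#include <stdlib.h>

/* Return the positive numeric value of the environment variable `name`,
   or `dflt` if it is unset, malformed, or environment variables are
   disabled by NoEnvVars. */
long env_size(const char *name, long dflt)
{
    (void)name;                 /* unused when NoEnvVars is defined */
#ifndef NoEnvVars
    {
        const char *s = getenv(name);

        if (s != NULL && *s != '\0') {
            char *end;
            long v = strtol(s, &end, 10);

            if (*end == '\0' && v > 0)
                return v;
        }
    }
#endif
    return dflt;
}
```

Compiled with NoEnvVars defined, the function reduces to returning the default, which mirrors the behavior described above: users lose the ability to adjust parameters, but nothing else changes.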
Character set: If you are porting Icon to a computer that uses
the EBCDIC character set, add
#define EBCDIC 1
to define.h.
Host identification: The identification of the host computer as
given by the Icon keyword &host needs to be specified in
define.h. The definition
#define HostStr "unspecified host"
is provided in define.h initially. This definition should be
changed to an appropriate value for your system.
Exit codes: Exit codes are determined by the following defini-
tions:
NormalExit (default: 0)
ErrorExit (default: 1)
Memory monitoring: The number of bytes for reporting block sizes
in allocation history files produced by memory monitoring [9] is
determined by
MMUnits (default: WordSize)
A smaller value is needed if the size of any Icon block is not an
even multiple of WordSize. This occurs, for example, on computers
with 80-bit (1-1/2 word) floating-point numbers, in which case
the value of MMUnits should be defined to be 2.
Clock rate: Hz defines the units returned by the times() function
call. Check the documentation for this function on your system.
If it says that times are returned in terms of 1/60 second, no
action is needed. Otherwise, define Hz in define.h to be the
number of times() units in one second.
The documentation may refer you to an additional file such as
/usr/include/sys/param.h. If so, check the value there, and
define Hz accordingly.
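The role of Hz is simply that of a divisor when tick counts are converted to time. The following sketch assumes the default of 60 units per second implied above; ticks_to_ms is a hypothetical helper, not the conversion used in iconx.

```c
#ifndef Hz
#define Hz 60       /* assumed default: times() reports 1/60-second units */
#endif

/* Convert a tick count, as reported by times(), to milliseconds. */
long ticks_to_ms(long ticks)
{
    return (ticks * 1000L) / Hz;
}
```

If Hz were left at 60 on a system whose times() counts in hundredths of a second, every reported time would be too large by a factor of 100/60, which is why the value should be checked against the system documentation.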
Executable Images: If you have a BSD UNIX system and want to
enable the function save(s), which allows an executable image of
a running Icon program to be saved [3], add
Keyboard functions: If your system supports the keyboard func-
tions getch(), getche(), and kbhit(), add
#define KeyboardFncs
to define.h.
System function: If your system supports the system() function
for executing a command line, add
#define SystemFnc
to define.h.
Dynamic hashing:
Four parameters configure the implementation of tables and
sets:
HSlots Initial number of hash buckets; it must be a
power of 2
HSegs Maximum number of hash bucket segments
MaxHLoad Maximum allowable loading factor
MinHLoad Minimum loading factor for new structures
The default values (listed below) are appropriate for most
systems. If you want to change the values, read the discussion
that follows.
Every set or table starts with HSlots hash buckets, using one
bucket segment. When the average hash bucket exceeds MaxHLoad
entries, the number of buckets is doubled and one more segment is
consumed. This repeats until HSegs segments are in use; after
that, the structure still grows but no more hash buckets are added.
MinHLoad is used only when copying a set or table or when
creating a new set through the intersection, union, or difference
of two other sets. In these cases a new set may be more lightly
loaded than otherwise, but never less than MinHLoad if it exceeds
a single bucket segment.
For all machines, the default load factors are 5 for MaxHLoad
and 1 for MinHLoad. Because splitting or combining buckets
halves or doubles the load factor, MinHLoad should be no more
than half MaxHLoad. The average number of elements in a hash
bucket over the life of a structure is about 2/3 x MaxHLoad,
assuming the structure is not so huge as to be limited by HSegs.
Increasing MaxHLoad delays the creation of new hash buckets,
reducing memory demands at the expense of increased search times.
It has no effect on the memory requirements of minimally-sized
structures.
HSlots and HSegs interact to determine the minimum size of a
structure and its maximum efficient capacity. The size of an
empty set or table is directly related to HSegs+HSlots; smaller
values of these parameters reduce the memory needs of programs
using many small structures. Doubling HSlots delays the onset of
the first structure reorganization until twice as many elements
have been inserted. It also doubles the capacity of a structure,
as does increasing HSegs by 1.
The maximum number of hash buckets is HSlots x 2^(HSegs-1). A
structure can be considered ``full'' when it contains MaxHLoad
times that many entries; beyond that, lookup times gradually
increase as more elements are added. Until a structure becomes
full, the values of HSlots and HSegs do not affect lookup times.
For machines with 16-bit ints, the defaults are 4 for HSlots
and 6 for HSegs. Sets and tables grow from 4 hash buckets to a
maximum of 128, and become full at 640 elements. For other
machines, the defaults are 8 for HSlots and 10 for HSegs. Sets
and tables grow from 8 hash buckets to a maximum of 4096, and
become full at 20480 elements.
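The figures above follow directly from the two formulas. The sketch below reproduces the arithmetic so that the consequences of other parameter values can be checked before changing them; max_buckets and full_at are hypothetical helpers, not routines from iconx.

```c
/* Maximum number of hash buckets: HSlots x 2^(HSegs-1). */
long max_buckets(long hslots, int hsegs)
{
    return hslots << (hsegs - 1);
}

/* Number of entries at which a structure is considered "full":
   MaxHLoad times the maximum number of buckets. */
long full_at(long hslots, int hsegs, int max_hload)
{
    return max_buckets(hslots, hsegs) * max_hload;
}
```

For the 16-bit defaults, max_buckets(4, 6) is 128 and full_at(4, 6, 5) is 640. For other machines, max_buckets(8, 10) is 4096 and full_at(8, 10, 5) is 20480.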
Optional features: Some features of Icon are optional. Some of
these normally are enabled, while others normally are disabled.
The features that normally are enabled can be disabled to, for
example, reduce the size of the executable files. A negative form
of definition is used for these, as in
#define NoLargeInts
which can be added to define.h to disable large-integer arith-
metic. It may be necessary to disable large-integer arithmetic on
computers with a small amount of memory, since the feature
increases the size of iconx by 15-20%.
Examine config.h to see what other features can be disabled
and the definitions to use.
One optional feature that normally is disabled is the ability
to call an Icon program from a C function [10]. This feature can
be enabled by adding
#define IconCalling
to define.h.
The implementation of co-expressions requires an assembly-
language routine. Initially, define.h contains
#define NoCoexpr
to disable co-expressions during the initial phases of transport-
ing Icon to a new system. Leave this definition in for the first
round, although you may want to remove it later and implement
co-expressions (see Section 7).
Search path: The -x option requires knowledge of where to find
iconx. The path is given in paths.h, which contains the follow-
ing as distributed:
#define IconxPath "iconx.exe"
This definition can be changed as needed.
5.2__Operating_System_Differences
Conditional compilation for operating systems usually is due
to differences in run-time library routines, differences in file
naming, the handling of input and output, and environmental fac-
tors.
The presently supported operating systems are AmigaDos, Atari
ST TOS, the Macintosh under MPW, MS-DOS, MVS, OS/2, UNIX,
VM/CMS, and VMS. There are hooks for transporting to an unspecified
system (a new port). The associated defined symbols are
AMIGA       AmigaDos
ATARI_ST    Atari ST TOS
HIGHC_386   MS-DOS in 32-bit protected mode for 80386 processors
MACINTOSH   Macintosh
MSDOS       MS-DOS
MVS         MVS
OS2         OS/2
PORT        new port
UNIX        UNIX
VM          VM/CMS
VMS         VMS
Conditional compilation uses logical expressions composed from
these symbols. An example is:
.
.
.
#if MSDOS
.
. /* code for MS-DOS */
.
#endif
#if UNIX || VMS
.
. /* code for UNIX and VMS */
.
#endif
.
.
.
Each symbol must be defined to be either 1 (for the target
operating system) or 0 (for all other operating systems). This
is accomplished by defining the symbol for the target operating
system to be 1 in define.h. In config.h, which includes define.h,
all other operating-system symbols are automatically defined to
be 0.
Logical conditionals with #if are used instead of defined or
undefined names with #ifdef to avoid nested conditionals, which
become very complicated and difficult to understand when there
are several alternative operating systems. Note that it is
important not to use #ifdef accidentally in place of #if, since
all the names are defined.
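The difference between #if and #ifdef can be seen in a small sketch. The symbol list is shortened to four names, and the defaulting to 0 imitates what config.h is described as doing; target_name and defined_count are hypothetical helpers, not Icon code.

```c
#define PORT 1          /* as define.h would do for the target system */

/* Every other operating-system symbol defaults to 0. */
#ifndef MSDOS
#define MSDOS 0
#endif
#ifndef UNIX
#define UNIX 0
#endif
#ifndef VMS
#define VMS 0
#endif
#ifndef PORT
#define PORT 0
#endif

/* Select code by VALUE with #if: only the target system's branch is kept. */
const char *target_name(void)
{
    const char *name = "none selected";

#if MSDOS
    name = "MS-DOS";
#endif

#if UNIX || VMS
    name = "UNIX or VMS";
#endif

#if PORT
    name = "new port";
#endif

    return name;
}

/* Test by DEFINEDNESS with #ifdef: every branch is kept, because every
   symbol is defined, whether to 0 or to 1. */
int defined_count(void)
{
    int n = 0;

#ifdef MSDOS
    n++;
#endif
#ifdef UNIX
    n++;
#endif
#ifdef VMS
    n++;
#endif
#ifdef PORT
    n++;
#endif

    return n;
}
```

Here target_name() selects only the PORT branch, while defined_count() counts all four symbols. An accidental #ifdef MSDOS would therefore compile MS-DOS code into every port.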
The file define.h initially contains
#define PORT 1
Leave it as is; later you should come back and change PORT to
some more appropriate name.
Note: The PORT sections contain deliberate syntax errors (so
marked) to prevent sections from being overlooked during porting.
These syntax errors must, of course, be removed before compila-
tion.
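Collecting the statements made so far about define.h as it is distributed gives the following partial picture. It is a sketch assembled from this report, not a copy of the file; the real define.h may contain other definitions and should be consulted directly.

```c
/* Partial sketch of define.h as distributed, limited to the settings
   this report says are present initially. */
#define DeBug                        /* enables all three kinds of debugging code */
#define FixedRegions                 /* fixed-size storage regions */
#define NoCoexpr                     /* co-expressions disabled for the first round */
#define HostStr "unspecified host"   /* value of &host; change for your system */
#define PORT 1                       /* target operating system: a new port */
```

Of these, HostStr and PORT are expected to change during a port, while DeBug, FixedRegions, and NoCoexpr are best left alone until the system is running.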
To make it easy to locate all the places where there is code
that may be dependent on the operating system, such code is
bracketed by unique comments of the following form:
/*
* The following code is operating-system dependent.
*/
.
.
.
/*
* End of operating-system specific code.
*/
Between these beginning and ending comments, the code for dif-
ferent operating systems is provided using conditional expres-
sions such as those indicated above.
There presently are a total of 43 segments that contain such
code. The files that contain operating-system-dependent code are
listed in Appendix B. Look through some of the files that con-
tain such segments to get an idea of what is involved. Each seg-
ment contains comments that describe the purpose of the code. In
some cases, the most likely code or a suggestion is given in the
conditional code under PORT. In some cases, no code will be
needed. In others, code for an existing system may suffice for
the new system.
In any event, code for the new operating system name must be
added to each such segment, either by adding it to a logical dis-
junction to take advantage of existing code for other systems, as
in
#if MSDOS || UNIX || PORT
.
.
.
#endif
#if VMS
.
.
.
#endif
and removing the present code for PORT or by filling in the seg-
ment with the appropriate code, as in
#if PORT
.
. /* code for the port */
.
#endif
If no code is needed for the target operating system, a comment should be
added so that it is clear that the situation has been considered.
You may find need for code that is operating-system dependent
at a place where no such dependency presently exists. If the
situation is idiosyncratic to your operating system, which is
most likely, simply use a conditional for PORT as shown above.
If the situation appears to need different code for several
operating systems, add a new segment similar to the other ones,
being sure to provide something appropriate for all operating
systems.
Do not use #else constructions in these segments; this
encourages errors and obscures the mutually exclusive nature of
operating system differences.
6.__Building_and_Testing
6.1__The_Command_Processor
Start by compiling all the C programs listed in icont.bat.
Link the resulting object files to produce icont. If you
encounter problems, first check the portions of code containing
operating system dependencies.
Once you have a version of icont, try it on the Icon programs
in tests. For example, to translate hello.icn in tests, do
icont -c hello.icn
The -c option stops icont at the point it produces ucode
files, which are an intermediate form of virtual machine code.
This should yield two ucode files, hello.u1 and hello.u2. The
.u1 file contains procedure declarations and code for the Icon
machine; the .u2 file contains global declaration information.
These files both consist of printable text. They should be
identical to the corresponding files in tests/stand unless the
EBCDIC character set is used in the port.
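On a system that lacks a file-comparison utility, the check can be
made with a few lines of C. The following is a minimal sketch;
files_identical is a hypothetical helper and is not part of the Icon
distribution.

```c
#include <stdio.h>

/*
 * files_identical - hypothetical helper: return 1 if the two named
 *  text files have the same contents, 0 if they differ, and -1 if
 *  either file cannot be opened.
 */
int files_identical(const char *name1, const char *name2)
{
   FILE *f1, *f2;
   int c1, c2;

   f1 = fopen(name1, "r");
   if (f1 == NULL)
      return -1;
   f2 = fopen(name2, "r");
   if (f2 == NULL) {
      fclose(f1);
      return -1;
      }

   /* read both files in step until they differ or both end */
   do {
      c1 = getc(f1);
      c2 = getc(f2);
      } while (c1 == c2 && c1 != EOF);

   fclose(f1);
   fclose(f2);
   return c1 == c2;             /* equal only if both reached EOF */
}
```

The helper would be called once for the .u1 file and once for the
.u2 file, each time with the name of the newly produced file and the
name of the corresponding standard file.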
Checking icode files is next. Since icode files are binary and
vary somewhat from system to system, they cannot be checked as
easily as ucode files. However, as mentioned in Section 5.1, if
icont is compiled with the linker debugging code enabled, the -L
command-line option produces a printable image in a file with
suffix .ux. For example,
icont -L hello.u1
produces an icode image hello.ux. Compare this to the
corresponding file in tests/stand. Remember that differences are
to be expected and the check is only a rough one.
6.2__The_Executor
If you get this far without apparent problems, you are ready
for the next part of the transporting process: iconx. Compile
all the C programs listed in iconx.bat and load them to form
iconx.
As a first test, try iconx on hello.icn in tests as follows:
icont hello.icn
iconx hello
If all is well, the last step should print out "hello world" and
some identifying information. If it doesn't, the problem may be
in either icont or iconx.
Once this test has been passed, more rigorous testing should
follow. At this point, you probably will want to devise a way of
testing programs, since there are a large number of tests. This
is done for the UNIX implementation using the following script:
for i in `cat $1.lst`
do
rm -f local/$i.out
echo Running $i
icont -s $i.icn
if test -r $i.dat
then
iconx $i <$i.dat >local/$i.out 2>&1
else
iconx $i >local/$i.out 2>&1
fi
echo Checking $i
diff local/$i.out stand/$i.out
rm -f $i
done
Something similar can be concocted for most other systems. Making
such a facility as easy to use as possible is worth the effort.
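Where no scripting language is available, the running part of the
script might be approximated in C along the following lines. This is
a sketch under stated assumptions: the routine names are
hypothetical, the command processor reached through system() is
assumed to accept the same redirection syntax as the script above,
and the comparison and cleanup steps are left out.

```c
#include <stdio.h>
#include <stdlib.h>

#define MaxCmd 512              /* arbitrary limit for this sketch */

/*
 * readable - return 1 if the named file can be opened for reading.
 */
static int readable(const char *name)
{
   FILE *f = fopen(name, "r");

   if (f == NULL)
      return 0;
   fclose(f);
   return 1;
}

/*
 * build_run_command - format the iconx command for one test in buf.
 *  Returns 1 if the command fits in size bytes and 0 otherwise.
 */
int build_run_command(char *buf, size_t size, const char *name, int has_data)
{
   int n;

   if (has_data)
      n = snprintf(buf, size, "iconx %s <%s.dat >local/%s.out 2>&1",
         name, name, name);
   else
      n = snprintf(buf, size, "iconx %s >local/%s.out 2>&1", name, name);
   return n >= 0 && (size_t)n < size;
}

/*
 * run_test_list - translate and run every test named in a list file.
 *  Returns the number of tests attempted or -1 if the list file
 *  cannot be opened.
 */
int run_test_list(const char *listname)
{
   FILE *lst;
   char name[64], dat[80], cmd[MaxCmd];
   int count = 0;

   lst = fopen(listname, "r");
   if (lst == NULL)
      return -1;
   while (fscanf(lst, "%63s", name) == 1) {
      printf("Running %s\n", name);
      snprintf(cmd, sizeof cmd, "icont -s %s.icn", name);
      system(cmd);
      snprintf(dat, sizeof dat, "%s.dat", name);
      if (build_run_command(cmd, sizeof cmd, name, readable(dat)))
         system(cmd);
      count++;
      }
   fclose(lst);
   return count;
}
```

A call such as run_test_list("icon.lst") corresponds to running the
script with the argument icon.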
There are many test programs for testing different aspects of
iconx. These range from simple tests to ``grinders''. The names
of the test programs are listed in the following files:
     check.lst    tests whose results differ from system to system
     coexpr.lst   tests that use co-expressions
     expr.lst     tests that contain a wide variety of expressions
     float.lst    tests that test floating-point arithmetic
     gc.lst       tests of garbage collection
     icon.lst     short but varied tests
     large.lst    tests of large-integer arithmetic
     model.lst    tests of features that depend on hashing parameters
     new.lst      tests of new features
     other.lst    tests of more complex programs
There are data files for all test programs, although some data
files are empty. The names of data files correspond to the names
of the Icon programs but end in .dat. For example, the Icon pro-
gram meander.icn, listed in icon.lst, takes data from
meander.dat. tests/stand contains files whose names end in .out
that contain the expected output of each test program. For exam-
ple, the expected output of meander.icn is contained in
meander.out.
Start with icon.lst. The output should be identical to that in
the distributed .out files. Any discrepancies should be checked
carefully and corrections made before continuing.
The programs listed in expr.lst execute a wide variety of
individual expressions. Ideally, there should be no discrepancies
between their output and the expected output. If there are many
discrepancies, something serious probably is wrong. If there are
only a few discrepancies, they may be noted while other testing
is conducted.
The program listed in check.lst certainly will show some
differences, since it tests features whose results are time- and
environment-dependent.
The programs listed in other.lst and new.lst test some
features that are not tested elsewhere. They should be treated
like the programs listed in icon.lst.
The programs listed in float.lst are likely to show many
differences, since the routines that convert floating-point
numbers to strings vary widely from system to system. It is
enough to check that the numerical magnitudes are correct.
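One way to make such a check mechanical is to compare the values
numerically rather than as text. The following is a minimal sketch;
the routine names and the choice of a relative tolerance are
assumptions and are not part of the Icon test procedure.

```c
#include <stdlib.h>

/*
 * magnitude - absolute value of x.
 */
static double magnitude(double x)
{
   return x < 0.0 ? -x : x;
}

/*
 * same_magnitude - return 1 if expected and actual agree to within
 *  the relative tolerance rel_tol, and 0 otherwise.
 */
int same_magnitude(double expected, double actual, double rel_tol)
{
   double e = magnitude(expected);
   double a = magnitude(actual);
   double scale = e > a ? e : a;

   return magnitude(expected - actual) <= rel_tol * scale;
}

/*
 * same_magnitude_text - the same check for two numbers given as
 *  text, so that "1e3" and "1000.0" are treated as equal.
 */
int same_magnitude_text(const char *expected, const char *actual,
   double rel_tol)
{
   return same_magnitude(strtod(expected, NULL), strtod(actual, NULL),
      rel_tol);
}
```

With this approach, two outputs that print the same value in
different formats compare as equal, while a value that is wrong by
more than the chosen tolerance is still reported.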
The program listed in model.lst shows differences if run on a
system that has 16-bit ints or if hashing parameters are altered.
Since storage management is one of the parts of Icon that is
likely to give trouble, there are special storage-management
tests in gc.lst. These programs run for a long period of time.
One program may show a difference in output if the fixed-regions
version of memory management is used, since it may run out of
space.
The programs in large.lst require large-integer arithmetic.
Run these tests if that feature is supported.
The programs in coexpr.lst require co-expressions. Save them
for later.
Not much general advice can be given about locating and
correcting problems that may show up in testing iconx. It has to
be done the hard way and may involve learning more about the Icon
language [4] and how it is implemented [1]. A good debugger can
be very helpful.
If your system can produce core dumps that are useful for
debugging, set the environment variable ICONCORE. This will cause
iconx to produce a core dump on abnormal termination.
7.__Co-Expressions
Once Icon is running satisfactorily, you may wish to implement
co-expressions. This requires an assembly-language routine.
Note: If your system does not allow the C stack to be at an
arbitrary place in memory, there is probably little hope of
implementing co-expressions. If you do not implement co-
expressions, the only effect will be that Icon programs that
attempt to use a co-expression will terminate with an error mes-
sage.
All aspects of co-expression creation and activation are writ-
ten in C in Version 8 except for a routine, coswitch, that is
needed for context switching. This routine requires assembly
language, since it must manipulate hardware registers. It can be
written either as a C routine with asm directives or as an
assembly-language routine.
Calls to the context switch have the form
coswitch(old_cs,new_cs,first), where old_cs is a pointer to an
array of words (C longs) that contain C state information for the
current co-expression, new_cs is a pointer to an array of words
that hold C state information for a co-expression to be
activated, and first is 0 if the new co-expression has not been
activated before and 1 if it has. The
zeroth element of a C state array always contains the hardware
stack pointer (sp) for that co-expression. The other elements can
be used to save any C frame pointers and any other registers your
C compiler expects to be preserved across calls.
The default size of the array for saving the C state is 15.
This number may be changed by adding
#define CStateSize n
to define.h, where n is the number of elements needed.
The first thing coswitch does is to save the current pointers
and registers in the old_cs array. Then it tests first. If first
is zero, coswitch sets sp from new_cs[0], clears the C frame
pointers, and calls interp. If first is not zero, it loads the
(previously saved) sp, C frame pointers, and registers from
new_cs and returns.
Written in C, coswitch has the form:
/*
* coswitch
*/
coswitch(old_cs, new_cs, first)
long *old_cs, *new_cs;
int first;
{
.
.
.
/* save sp, frame pointers, and other registers in old_cs */
.
.
.
if (first == 0) { /* this is first activation */
.
.
.
/* load sp from new_cs[0] and clear frame pointers */
.
.
.
interp(0, 0);
syserr("interp() returned in coswitch");
}
else {
.
.
.
/* load sp, frame pointers, and other registers from new_cs */
.
.
.
}
}
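The order of operations in this outline can be traced with a model
written entirely in C. The sketch below is not a context switch: the
variables sim_sp and sim_fp only stand in for hardware registers,
sim_interp stands in for interp (and, unlike the real routine,
returns), and the use of element 1 for the frame pointer is an
arbitrary choice. All of these names are hypothetical.

```c
#define CStateSize 15           /* default size stated in the text */

/*
 * Simulated machine state.  These variables stand in for hardware
 *  registers; a real coswitch must use assembly language instead.
 */
static long sim_sp;             /* simulated stack pointer */
static long sim_fp;             /* simulated C frame pointer */
static int sim_interp_calls;    /* number of simulated calls of interp */

/*
 * sim_interp - stand-in for interp.
 */
static void sim_interp(void)
{
   sim_interp_calls++;
}

/*
 * model_coswitch - follows the steps described for coswitch, acting
 *  on the simulated registers only.
 */
void model_coswitch(long *old_cs, long *new_cs, int first)
{
   /* save sp and frame pointer in old_cs */
   old_cs[0] = sim_sp;
   old_cs[1] = sim_fp;          /* element 1 is an arbitrary choice */

   if (first == 0) {            /* this is first activation */
      /* load sp from new_cs[0] and clear frame pointer */
      sim_sp = new_cs[0];
      sim_fp = 0;
      sim_interp();
      }
   else {
      /* load sp and frame pointer from new_cs */
      sim_sp = new_cs[0];
      sim_fp = new_cs[1];
      }
}
```

Stepping through two calls, one with first equal to 0 and one that
switches back with first equal to 1, shows that the state saved by
the first call is exactly what the second call restores.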
After you implement coswitch, remove the #define NoCoexpr from
define.h.
To test your context switch, run the programs in coexpr.lst.
Ideally, there should be no differences in the comparison of out-
puts.
If you have trouble with your context switch, the first thing
to do is double-check the registers that your C compiler expects
to be preserved across calls - different C compilers on the same
computer may have different requirements.
Another possible source of problems is built-in stack check-
ing. Co-expressions rely on being able to specify an arbitrary
region of memory for the C stack. If your C compiler generates
code for stack probes that expects the C stack to be at a
specific location, you may need to disable this code or replace
it with something more appropriate.
8.__Trouble_Reports_and_Feedback
If you run into problems, contact us at the Icon Project:
Icon Project
Department of Computer Science
Gould-Simpson Building
The University of Arizona
Tucson, AZ 85721
U.S.A.
(602) 621-4049
icon-project@cs.arizona.edu (Internet)
... {uunet, allegra, noao}!arizona!icon-project (uucp)
Please also let us know of any suggestions for improvements to
the porting process.
Once you have completed your port, please send us copies of
any files that you modified so that we can make corresponding
changes in the central version of the source code. Once this is
done, you can get a new copy of the source code whenever changes
or extensions are made to the implementation. Be sure to include
documentation on any features that are not implemented in your
port or any changes that would affect users.
Acknowledgements
Many persons have been involved in the implementation of Icon.
Contributions to its portability have been made by Mark Emmer,
Bill Mitchell, Gregg Townsend, Ken Walker, and Cheyenne Wills.
References
1. R. E. Griswold and M. T. Griswold, The Implementation of the
Icon Programming Language, Princeton University Press, 1986.
2. R. E. Griswold, Installation Guide for Version 8 of Icon on
UNIX Systems, The Univ. of Arizona Tech. Rep. 90-2, 1990.
3. R. E. Griswold, Version 8 of Icon, The Univ. of Arizona
Tech. Rep. 90-1, 1990.
4. R. E. Griswold and M. T. Griswold, The Icon Programming
Language, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.
5. R. E. Griswold, An Overview of Version 8 of the Icon
Programming Language, The Univ. of Arizona Tech. Rep. 90-6,
1990.
6. B. W. Kernighan and D. M. Ritchie, The C Programming
Language, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1978.
7. Technical Committee X3J11, Draft Proposed American National
Standard for Information Systems - Programming Language C,
1988.
8. R. E. Griswold, ICONT(1), manual page for UNIX Programmer's
Manual, The Univ. of Arizona Icon Project Document IPD109,
1990.
9. G. M. Townsend, The Icon Memory Monitoring System, The Univ.
of Arizona Icon Project Document IPD113, 1990.
10. R. E. Griswold, Icon-C Calling Interfaces, The Univ. of
Arizona Tech. Rep. 90-8, 1990.
Appendix A - Files Used for Components of Icon
Files marked by * are used in more than one component.
Files_Used_for_icont
config.h* general configuration information
cproto.h* function prototypes
cpuconf.h* processor configuration information
define.h* system-dependent definitions
fdefs.h* function definitions
general.h general header information
globals.h global declarations
header.h* icode header structure
keyword.h* keyword definitions
lfile.h information for link declarations
link.h heading information for the linker
odefs.h* operator definitions
opcode.h opcode structure
opdefs.h* icode instruction definitions
paths.h* file paths
proto.h* function prototypes
rt.h* header for run-time system
sizes.h data sizing
tlex.h information for lexical analysis
token.h token definitions
tproto.h function prototypes
trans.h heading information for the translator
tree.h code tree information
tsym.h information for symbol tables
version.h* version information
ebcdic.c EBCDIC conversion routines
err.c error messages
getopt.c command-line processing routines
keyword.c keyword structure
lcode.c linker code generator
lglob.c processor for global linking information
link.c linker
llex.c lexical analyzer
lmem.c linker memory management
long.c* long-string routines
lnklist.c file linking
lsym.c linker symbol table management
opcode.c opcode table
optab.c state tables for operator recognition
parse.c parser
tcode.c translator code generator
tlex.c lexical analyzer for translation
tlocal.c local routines
tmain.c main program
tmem.c memory management for translation
toktab.c token table
trans.c translator
tree.c code tree constructor
tsym.c translator symbol table management
util.c utility routines
Files_Used_for_iconx
config.h* general configuration information
cproto.h* function prototypes
cpuconf.h* computer configuration information
define.h* system-dependent definitions
fdefs.h* function definitions
gc.h garbage collection definitions
header.h* icode header
keyword.h* keyword definitions
memsize.h* memory sizing
odefs.h* operator definitions
opdefs.h* icode definitions
proto.h* function prototypes
rproto.h* function prototypes
rt.h* run-time definitions
version.h* version information
extcall.c external function stub
fconv.c conversion functions
fmath.c math functions
fmemmon.c memory-monitoring functions
fmisc.c miscellaneous functions
fscan.c scanning functions
fstr.c string construction functions
fstranl.c string analysis functions
fstruct.c data structure functions
fsys.c system functions
fxtra.c extra functions
idata.c data
imain.c main program
interp.c icode interpreter
invoke.c function and procedure invocation
istart.c main program for calling Icon from C
lmisc.c miscellaneous library routines
long.c* long-integer routines
lrec.c library routines for record
lscan.c scanning routines
memory.c memory-management routines
oarith.c arithmetic operations
oasgn.c assignment operations
ocat.c concatenation operations
ocomp.c comparison operations
omisc.c miscellaneous operations
oref.c referencing operations
oset.c set operations
ovalue.c value operations
time.c time and date routines
rcomp.c comparison routines
rconv.c conversion routines
rdebug.c debugging routines
rdefault.c default value routines
rdoasgn.c assignment routines
rlocal.c local routines
rlargint.c large-integer routines
rmemexp.c memory management routines for expandable regions
rmemfix.c memory management routines for fixed regions
rmemmgt.c general memory management routines
rmisc.c miscellaneous routines
rstruct.c structure routines
rsys.c system routines
Appendix B - System-Dependent Code
The following source files contain code that is operating-
system dependent. The number of places where such code occurs in
each file is given in parentheses.
h:
config.h (1)
proto.h (1)
rt.h (1)
icont:
link.c (3)
lmem.c (4)
tlocal.c (1)
tmain.c (4)
util.c (1)
iconx:
fmath.c (1)
fsys.c (6)
imain.c (6)
interp.c (4)
rconv.c (1)
rlocal.c (1)
rmemexp.c (1)
rmisc.c (1)
common:
time.c (6)